
We introduce and study an artificial neural network inspired by the probabilistic Receptor Affinity Distribution model of olfaction. Our system consists of N sensory neurons whose outputs converge on a single processing linear threshold element. The system's aim is to model discrimination of a single target odorant from a large number p of background odorants, within a range of odorant concentrations. We show that this is possible provided p does not exceed a critical value $p_c$, and calculate the critical capacity $\alpha_c = p_c/N$. The critical capacity depends on the range of concentrations in which the discrimination is to be accomplished. If the olfactory bulb may be thought of as a collection of such processing elements, each responsible for the discrimination of a single odorant, our study provides a quantitative analysis of the potential computational properties of the olfactory bulb. The mathematical formulation of the problem we consider is one of determining the capacity for linear separability of continuous curves embedded in a high-dimensional space. This is accomplished here by a numerical study, using a method that signals whether the discrimination task is realizable or not, together with a finite size scaling analysis.
I. INTRODUCTION



The basic machinery for olfaction, our ability to smell, is an array of a few hundred different types of sensory neurons. Each of these expresses molecular receptors that belong to a single type. When this small neuronal assembly is exposed to external stimuli, its cooperative response is capable of detecting and recognizing a wide variety of odorants and of measuring their concentrations. We use the term "odorant" to describe any chemically homogeneous substance (ligand) which elicits a response from the olfactory system.

The response of the array of neurons to any particular odorant is determined by the responses of the individual constituent neurons. The response of each neuron is, in turn, governed by the extent to which the receptors it expresses bind the odorant, i.e. by the affinity K of the neuron's receptors to the odorant. According to a recently proposed model [1], these affinities can be viewed as independent random variables, drawn from a single receptor affinity distribution (RAD), denoted by $\psi(K)$. Once a set of affinities (for all odorants and all sensory neurons) has been generated, the response of the entire sensory assembly to any odorant is determined. This information is transferred from the sensory neurons to the olfactory bulb, onto which the axons of the sensory neurons project. They form synapses on secondary neurons (mitral and tufted cells). This integration of the sensory input in the olfactory bulb forms the first step of the information processing that takes place in the olfactory pathway. Interneurons of two major types (periglomerular and granule cells) are believed to play a role in computing the pattern transmitted from the olfactory bulb to higher brain centers.

In this paper we evaluate, on the basis of a very simple model, some of the potential computational characteristics of the olfactory bulb as it performs this initial integration. We hope that some of our quantitative results are biologically relevant. Our simple model for the sensory array and a single processing unit is depicted in Fig. 1. The model we introduce is, however, also interesting from a mathematical point of view. The problem of Linear Separability (LS) of points in N-dimensional space has received considerable attention since the 19th century [?]. In the mathematics literature, Cover studied the problem of LS of independent dichotomies using combinatorial methods [10]. In computer science the perceptron, introduced by Rosenblatt [?] and analyzed in detail by Minsky and Papert [11], gave a major boost to the field of neural networks. More recently, by introducing Statistical Mechanics techniques, Gardner [?] extended Cover's results to cases where there are correlations between the points that have to be linearly separated.

We generalize the problem of separating (zero-dimensional) points to the separability of (one-dimensional) strings or curves, embedded in N-dimensional space. In the context of our problem the curves that need to be separated are parametrized continuously by the odorant concentration.
In principle one can address the separability of curves by placing a discrete set of points on each curve, thereby mapping the problem onto the previously solved one of separating points. One should note, however, that points that lie on the same curve are not independent; in fact they are correlated in ways that render the previously developed analytical methods inapplicable. Therefore we present an extensive numerical analysis of the capacity of this special neural network, of N sensory neurons that provide input to a single processing unit. The capacity we calculate is interpreted as follows. The sensory system is exposed to p + 1 odorants, one at a time. One of these is the "target"; the aim is to distinguish the target from all the other $p = \alpha N$ odorants that form a "noisy olfactory background".

The model, based on a single layer perceptron, is introduced and discussed in detail in section 2.1. We then describe the method we have developed in order to determine the capacity numerically. To do this we had to adapt and use several different techniques. One of these, a learning algorithm introduced by Nabutovsky and Domany, is described in Sec 2.2. This algorithm, like all other perceptron learning rules, finds the separation plane (if the problem is LS); however, unlike other learning algorithms, it rigorously signals when a sample of examples is not LS.

Another technique we had to adapt to our purposes is finite size scaling (FSS) analysis of the data. The main results are presented in Sec 3 as curves of capacity as a function of odorant concentration in the thermodynamic ($N \to \infty$) limit, obtained by extrapolation, using FSS, from data obtained at a sequence of N values. This large N limit is quite natural from both practical and theoretical points of view. In practice, for N of the order of a few hundred, the results can hardly be distinguished numerically from those at the $N \to \infty$ limit. As to the theoretical side, the situation in this limit is much cleaner and easier to analyze. The final section 4 contains a critical discussion of the results from a biological point of view.

Our central finding is summarized in Fig. 6: if we fix the range of concentrations in which the system operates, and increase the number of background odorants, we will reach a critical number $p_c$ beyond which the system fails to discriminate the target. This critical number is proportional to the number of sensory neurons N, i.e. $p_c = \alpha_c N$, and it decreases when the concentration range increases.



II. COMPUTATIONAL MODEL 



A. Odorant identification as Linear Separation of Curves in N-dimensional space 

The simple neural assembly that is considered here consists of a single secondary neuron, which receives inputs from an array of N units that model the sensory neurons. The single secondary neuron represents a "grandmother cell", whose task is to detect one particular "target" odorant, labeled 0. The sensory scenario we consider allows exposure of the neuronal assembly to a single odorant, which may either be the target odorant or one of $p = \alpha N$ background odorants. The odorant provides simultaneous stimuli to the N sensory neurons. The aim of the single secondary neuron is to determine whether the odorant that generated the incoming signal from the sensory array is the target odorant or not. We assume that all odorants, background and target, are presented to the sensory array in concentrations H that lie within a range

$$H_{min} \le H \le H_{max} \qquad (1)$$

We pose the following, well defined quantitative question:

What is the maximal number $p_c$ of different background odorants that our neuron can distinguish from the target, for any concentration within the prescribed range?

To sharpen the question, we put it in a more precise mathematical form. Consider $\mu = 1, 2, \ldots, p$ background odorants with respective concentrations $H^\mu$ in the range (1). Odorant $\mu$ is characterized by the $i = 1, \ldots, N$ affinities $K_i^\mu$ of the N receptors. According to the RAD model, these affinities are selected independently from a distribution $\psi(K)$ [1]. All our numerical results were obtained using for $\psi(K)$ the form (note: $K > 0$)

$$\psi(K) = \frac{K}{\sigma^2} \exp\left(-\frac{K^2}{2\sigma^2}\right) \qquad (2)$$

The average and standard deviation of this distribution are given by

$$\langle K \rangle = \sqrt{\pi/2}\,\sigma \approx 1.25\,\sigma, \qquad \sqrt{\langle K^2 \rangle - \langle K \rangle^2} \approx 0.65\,\sigma \qquad (3)$$
The distributions suggested in [1] were Poisson and binomial. With regard to the computational limitations of our model, the important idea behind the RAD model lies not in the exact form of the distribution, but in the fact that the affinities can be thought of as independent random variables. The main computational features of our model will not be altered as long as the distribution has the following features: it is zero for negative affinities and it has finite first and second moments. We have used the $\psi(K)$ of eq. (2) since it satisfies these constraints and is easier to deal with in analytical calculations.

When receptor i is exposed to odorant $\mu$ at concentration $H^\mu$, its response is given by

$$S_i^\mu = f(K_i^\mu H^\mu) \qquad (4)$$

where f(x) is a sigmoid shaped function; we use

$$f(x) = \frac{x}{1 + x} \qquad (5)$$

throughout this paper.

The value taken by the affinity sets the particular concentration scale at which odorant $\mu$ affects the ith sensory neuron. From this point on we set $\sigma = 1$ in eq. (2); this means that the concentrations are measured in inverse units of the parameter $\sigma$.
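As a minimal sketch of how the affinities and responses above could be generated numerically, the following Python snippet draws affinities and evaluates eqs. (4) and (5). It assumes that $\psi(K)$ has the Rayleigh form $(K/\sigma^2)\exp(-K^2/2\sigma^2)$ with $\sigma = 1$, sampled by inverting its cumulative distribution; the function names and the choice of N = 5 are illustrative and are not taken from the paper.

```python
import math
import random


def sample_affinity(rng, sigma=1.0):
    """Draw K >= 0 from the (assumed) Rayleigh form of psi(K) by inverse-CDF sampling."""
    u = rng.random()  # uniform in [0, 1)
    return sigma * math.sqrt(-2.0 * math.log(1.0 - u))


def response(k, h):
    """Eqs. (4)-(5): S = f(K * H) with f(x) = x / (1 + x)."""
    x = k * h
    return x / (1.0 + x)


def sensory_vector(affinities, h):
    """Responses of all sensory neurons to one odorant at concentration h."""
    return [response(k, h) for k in affinities]


rng = random.Random(0)
affinities = [sample_affinity(rng) for _ in range(5)]  # N = 5 receptors (illustrative)
low = sensory_vector(affinities, 0.5)    # same odorant at low concentration
high = sensory_vector(affinities, 50.0)  # same odorant at high concentration
```

Because f saturates at 1, every component of `high` is close to 1 regardless of the affinity, which is the saturation effect that makes discrimination harder at large concentrations.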

The set of values $\{S_i^\mu\} = \{S_1^\mu, S_2^\mu, \ldots, S_N^\mu\}$ constitutes a vector of signals $\mathbf{S}^\mu$, generated by the entire sensory array when it is exposed to odorant $\mu$. The $\{S_i^\mu\}$ serve as inputs to our secondary neuron, which we model as a linear threshold element or perceptron; its output signal is given by

$$s^\mu = \mathrm{sign}\left(\sum_{i=1}^{N} w_i S_i^\mu\right) = \mathrm{sign}\,(\mathbf{w} \cdot \mathbf{S}^\mu) \qquad (6)$$

The simple neural network described above is schematically presented in Fig. 1. The sensory neurons are represented by boxes and the secondary neuron by a circle.




FIG. 1. Schematic representation of our model. N sensory neurons, represented by boxes, are exposed to odorant $\mu$, present in concentration $H^\mu$. Each sensory neuron i is characterized by a set of affinities $K_i^\mu$; the sensory input elicits from neuron i a response $S_i^\mu$, as given by eq. (4). A weighted sum of the N sensory responses serves as the input of the secondary neuron (circle), whose output $s^\mu$, generated in response to odorant $\mu$, is given by eq. (6).



We require the output of this neuron to differentiate the target odorant from the background, i.e. yield

$$s^\mu = \begin{cases} -1 & \text{for } \mu = 1, \ldots, p \text{ (background)} \\ +1 & \text{for } \mu = 0 \text{ (target)} \end{cases} \qquad (7)$$

for any odorant concentration in the allowed range (1).

To understand the geometrical meaning of this requirement, note that when the concentration of odorant $\mu$ is varied in the allowed range (1), the corresponding vector $\mathbf{S}^\mu$ traces a curve (or string) in the N-dimensional space of sensory responses. The requirement means that there exists a hyperplane, such that the entire curve that corresponds to the target odorant lies on one side of it, while the curves that correspond to all p background odorants lie on the other side. This explains our statement, made in the Introduction, that the problem we solve deals with the Linear Separability of curves.

We show that a solution to this classification problem can be found, provided $p < p_{max} = \alpha_c N$. We estimate the critical capacity $\alpha_c$ numerically. This is done by extrapolating results obtained for various values of N, using finite size scaling techniques, to the limit $N \to \infty$. The value of $\alpha_c$ is evaluated as a function of the limiting odorant concentrations.

In order to obtain these results using existing methodology, the most natural and straightforward thing to do is to place a discrete set of $\zeta = 1, 2, \ldots, M$ points $\mathbf{S}^{\mu\zeta}$ on each curve, corresponding to different concentrations, and to require that the M points that lie on the curve of the target odorant are linearly separable from the pM points that represent the background. That is, equations (7) become

$$s^{\mu\zeta} = \begin{cases} -1 & \text{for } \mu = 1, \ldots, p;\ \zeta = 1, \ldots, M \text{ (background)} \\ +1 & \text{for } \mu = 0;\ \zeta = 1, \ldots, M \text{ (target)} \end{cases} \qquad (8)$$

This raises the technical question of how many (discrete) representatives of the same odorant should be included in the learning set. We show below that, while the critical number of odorants $p_{max}$ scales linearly with N, the number of representatives of a single odorant, M, has to grow at least as fast as $N^2$. This ensures that increasing M further does not change the results of the calculation (e.g. the value of $p_{max}$) and hence that the M discrete points indeed represent correctly the continuous curves on which they lie.

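Our problem has been turned into one of learning M(p + 1) "patterns" that constitute our training set $\mathcal{L}$. For technical reasons it is convenient to introduce and work with normalized patterns,

$$\boldsymbol{\xi}^{\mu\zeta} = \frac{s^{\mu\zeta}\,\mathbf{S}^{\mu\zeta}}{\sqrt{(\mathbf{S}^{\mu\zeta})^2}} \qquad (9)$$

with $\zeta = 1, \ldots, M$ running over the M discrete concentrations and $\mu = 0, 1, \ldots, p$ over all odorants. Note that we also multiplied each pattern $\mathbf{S}^{\mu\zeta}$ by its desired output, $s^{\mu\zeta}$; after this change of representation the condition of linear separability (8) becomes

$$\mathrm{sign}\,(\mathbf{w} \cdot \boldsymbol{\xi}^{\mu\zeta}) > 0 \quad \text{for } \mu = 0, 1, \ldots, p \text{ and } \zeta = 1, 2, \ldots, M \qquad (10)$$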
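The construction of the training set can be sketched in a few lines of Python. The snippet below builds the normalized, sign-multiplied patterns of eq. (9) for M random concentrations per odorant; the function name `build_patterns`, the layout of `affinities` (row 0 is the target) and the numerical affinity values are assumptions made for illustration only.

```python
import math
import random


def response(k, h):
    """Eqs. (4)-(5): S = f(K * H) with f(x) = x / (1 + x)."""
    x = k * h
    return x / (1.0 + x)


def build_patterns(affinities, h_min, h_max, m, rng):
    """Return the M(p+1) patterns xi = s * S / |S|.

    affinities[mu][i] is the affinity of receptor i to odorant mu;
    mu = 0 is the target (desired output +1), all others are background (-1).
    """
    patterns = []
    for mu, k_row in enumerate(affinities):
        desired = 1.0 if mu == 0 else -1.0
        for _ in range(m):
            h = rng.uniform(h_min, h_max)            # random concentration in range (1)
            s = [response(k, h) for k in k_row]      # point on the odorant's curve
            norm = math.sqrt(sum(v * v for v in s))  # |S|
            patterns.append([desired * v / norm for v in s])
    return patterns


rng = random.Random(1)
# Hypothetical affinities: one target and p = 2 background odorants, N = 3 receptors.
affinities = [[0.5, 1.2, 2.0], [1.5, 0.3, 0.9], [0.8, 0.8, 2.5]]
patterns = build_patterns(affinities, 0.5, 50.0, m=4, rng=rng)
```

With this representation a weight vector solves the problem exactly when its scalar product with every pattern is positive, so a single test covers both target and background.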



B. The Learning algorithm 



The question posed above, whether the target odorant can or cannot be distinguished from the background, has been reduced to the following one: is there a set of weights $w_i$, $i = 1, \ldots, N$, for which all M(p + 1) inequalities (10) are satisfied? This problem is of the type studied by Rosenblatt [?], and is an example of classification by a single layer perceptron. A solution exists if one can find a weight vector $\mathbf{w}^*$ (that parametrizes the perceptron) such that for all the patterns in the training set $\mathcal{L}$ the "field"

$$h^{\mu\zeta} = \boldsymbol{\xi}^{\mu\zeta} \cdot \mathbf{w}^* > 0, \qquad (11)$$

i.e. the projection of the weight vector $\mathbf{w}^*$ onto all patterns $\boldsymbol{\xi}^{\mu\zeta}$ is positive. We wish to determine the size of the training set $\mathcal{L}$, i.e. the number of background odorants p, for which a solution $\mathbf{w}^*$ can be found. This is done by executing a search for a solution $\mathbf{w}^*$ by means of a learning algorithm. There are several learning algorithms (e.g. Rosenblatt [?], Abbott and Kepler [?]) in the literature; all are guaranteed to find such a weight vector in a finite number of steps, provided a solution exists. If, however, the problem is not LS and a solution does not exist, most learning algorithms will just run ad infinitum. An exception to this is the algorithm of Nabutovsky and Domany (ND) [?], which detects, in finite time, that a problem is non-learnable. This is a batch perceptron learning algorithm, presenting sequentially the entire training set $\mathcal{L}$ in one "sweep" and repeating the process until either a solution is found or non-learnability is established. We found that this algorithm is efficient and convenient to use (see [?] for other algorithms that detect non-LS problems).¹

ND introduced a parameter d, which they called despair, that is calculated "on line" in the course of the learning process. d is bounded if the training set $\mathcal{L}$ is LS. Since the ND algorithm can be shown either to find a solution $\mathbf{w}^*$ or to transgress the bound for d in a finite number of learning iterations, d effectively signals whether the learning set $\mathcal{L}$ fails to be linearly separable. The theorem they proved can be easily extended to the distribution of examples in our problem.² We introduced a halting criterion, which is probably more stringent than necessary, since no attempt has been made to determine an optimal lower bound. In Figs. 2(a) and 2(b) typical evolutions of the despair are shown for an LS case and for a non-LS case, respectively. The behavior of d is strikingly different in the two cases, showing that d is indeed a good indicator of learnability. In the learnable cases d grows linearly with the number of learning sweeps until a solution is found (and the curves terminate). In the non-LS cases d grows exponentially with the number of sweeps and would continue to grow; the process is halted when its value exceeds a known bound that must be satisfied if the problem is LS.
FIG. 2. Despair d as a function of the number of sweeps for N = 10, $H_{max} = 50$, $H_{min} = 0.5$ and p = 25; learnable cases are shown in (a) while those that were not learnable are shown in (b). Note the huge difference in both vertical and horizontal scales.



We now describe the ND algorithm used in the simulations. The patterns of the learning set $\mathcal{L}$ are presented one at a time (one cycle through the set constitutes a sweep). ND have shown [?] that for binary valued patterns ($\xi_i = \pm 1$), i.e. patterns on the vertices of a hypercube, an upper bound $d_c$ exists iff the training set is LS. On the other hand, the dynamics is shown to take d beyond that bound in a finite (linear in N) number of iterations, unless a solution exists and the algorithm halts. Initialize the process with d = 1 and an initial weight vector $\mathbf{w}$, and go to the first example. If it is correctly classified, do nothing to the current weight vector and go to the next example. Once a misclassified example $\boldsymbol{\xi}^{\mu\zeta}$ is found, update the weight vector as well as the parameter d, according to



$$\mathbf{w}_{new} = \frac{\mathbf{w} + \eta\,\boldsymbol{\xi}^{\mu\zeta}}{\sqrt{1 + 2\eta h^{\mu\zeta} + \eta^2}} \qquad (12)$$

$$d_{new} = \frac{d + \eta}{\sqrt{1 + 2\eta h^{\mu\zeta} + \eta^2}}, \qquad (13)$$

$\eta$ is not just a learning rate parameter but an effective modulation function, chosen in order to maximize the increase of the despair as

$$\eta = \frac{1/d - h^{\mu\zeta}}{1 - h^{\mu\zeta}/d} \qquad (14)$$

¹ The fact that we use a learning procedure to establish the boundaries of linear separability does not imply, and is unrelated to, any possible plasticity of the olfactory bulb. The algorithm is used only to either find a solution or show that none exists.

² The original ND algorithm was designed for binary vectors, i.e. vectors pointing at the corners of an N-dimensional hypercube. It can be shown that the theorem can be extended to vectors on the unit sphere.



The learning dynamics halts if all patterns are correctly classified or, alternatively, if the value of d exceeds an upper bound, given by

$$d > d_c = \frac{N^{(N+1)/2}}{2^{N-1}} \qquad (15)$$

This is guaranteed to happen in at most $N d_c^2$ steps.
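The procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the update rules are taken to be $\mathbf{w} \to (\mathbf{w} + \eta\boldsymbol{\xi})/\sqrt{1 + 2\eta h + \eta^2}$ and $d \to (d + \eta)/\sqrt{1 + 2\eta h + \eta^2}$ with $\eta = (1/d - h)/(1 - h/d)$, the weight vector is initialized to the first pattern (an assumption), patterns are assumed to have unit norm, and the halting value `d_bound` is left as a parameter supplied by the caller.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nd_learn(patterns, d_bound, max_sweeps=1000):
    """Return (status, w, d) with status "LS", "non-LS" or "undecided".

    patterns: unit-norm vectors already multiplied by their desired outputs,
    so that a solution satisfies w . xi > 0 for every pattern.
    """
    w = list(patterns[0])  # assumption: start from the first pattern
    d = 1.0
    for _ in range(max_sweeps):
        errors = 0
        for xi in patterns:
            h = dot(w, xi)
            if h > 0.0:
                continue  # correctly classified: leave w and d unchanged
            errors += 1
            eta = (1.0 / d - h) / (1.0 - h / d)  # assumed form of the modulation
            norm_sq = 1.0 + 2.0 * eta * h + eta * eta
            if norm_sq <= 1e-24:
                # |w + eta * xi| vanishes, so the despair diverges past any bound.
                return "non-LS", None, math.inf
            norm = math.sqrt(norm_sq)
            w = [(wi + eta * x) / norm for wi, x in zip(w, xi)]
            d = (d + eta) / norm
            if d > d_bound:
                return "non-LS", None, d
        if errors == 0:
            return "LS", w, d
    return "undecided", None, d


# Three unit vectors with positive components: separable from the origin.
separable = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
status_ls, w_ls, d_ls = nd_learn(separable, d_bound=1e6)

# Three unit vectors at 120 degrees sum to zero: no w has w . xi > 0 for all.
angles = (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)
triangle = [[math.cos(a), math.sin(a)] for a in angles]
status_tri, w_tri, d_tri = nd_learn(triangle, d_bound=1e6)
```

The "undecided" status is returned only when the sweep budget runs out before either halting condition is met; it is a practical safeguard of this sketch rather than part of the algorithm as described in the text.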
FIG. 3. The estimated probabilities saturate when M, the number of representatives of a given odorant, exceeds $M_c \sim N^2$. For brevity we denote $P(H_{min}, H_{max}, p, N)$ by $P(h, p/N)$. In all cases p = 25 odorants were used with $H_{min} = 0.5$.



III. NUMERICAL EXPERIMENTS 



Since there is a large number of parameters that are to be varied, we present first a detailed description of the 
manner in which we deal with every one of them. 

There are two random elements in our studies. The first is in the selection of M concentrations for each odorant, within the range (1); the second is the choice of $K_i^\mu$, the affinity of receptor i to odorant $\mu$, selected at random from the distribution $\psi(K)$ of eq. (2). For every choice of the remaining variables we generate an ensemble of experiments and average the object we are measuring over these two random elements. We select the set of affinities $L_a$ times and for each of these perform the random selection of concentrations $L_c$ times.

The object we wish to estimate numerically is the probability P that the p curves described in the Introduction are LS. To this end we place M points on each curve and measure the corresponding probability $P(H_{min}, H_{max}, p, N; M)$. As we will see, for large enough values, $M > M_c$, this probability becomes independent of M; beyond $M_c$ the set of M discrete points represents the corresponding curves faithfully and hence the limiting value $P(H_{min}, H_{max}, p, N; M > M_c)$ is our estimate for $P(H_{min}, H_{max}, p, N)$.

Finally, we are interested in this function in the large N limit, i.e. when $N \to \infty$ and $p \to \infty$, while $\alpha = p/N$ is fixed. This limit is obtained by extrapolating our finite results, using finite size scaling methods.

Our first task is to determine how $M_c$ scales with N; that is, how dense a set of concentrations is to be used so that M discrete points represent accurately the continuous curves $\mathbf{S}^\mu(H^\mu)$ of eq. (4)?
A. Scaling of $M_c$



We choose values for N (number of receptor cells), p (number of odorants) and $H_{min}$, $H_{max}$ (limiting concentrations). We also set some value for M, the number of concentrations by which every odorant is represented (M will be varied).

We proceeded according to the following steps:

1. Draw from the distribution $\psi(K)$ a set of affinities $K_i^\mu$ for all N receptors and p odorants.

2. Generate for each odorant M concentration values, from a uniform distribution in the allowed range $H_{min} \le H \le H_{max}$, and construct the set $\{\boldsymbol{\xi}^{\mu\zeta}\}$ of normalized patterns.

3. Run the ND learning algorithm until it stops; register whether the set $\{\boldsymbol{\xi}^{\mu\zeta}\}$ was LS or not.

Steps 2 and 3 are repeated $L_c$ times for each set of affinities; the whole process 1 - 3 is repeated for $L_a$ different sets of affinities. We used $L_a = 100$; increasing it further made no difference. With such a value of $L_a$ the results did not depend on $L_c$; having tried $1 \le L_c \le 10$, we used $L_c = 1$ in our simulations.
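The nested averaging over the two random elements can be sketched as a small Python routine. The three callables are hypothetical placeholders for steps 1 to 3 (drawing affinities, drawing concentrations and building patterns, and running the learning algorithm); the toy stand-ins at the bottom only demonstrate the call pattern and carry no physical meaning.

```python
import random


def estimate_ls_probability(draw_affinities, draw_patterns, is_separable,
                            l_a, l_c, rng):
    """Fraction of the l_a * l_c experiments that turned out linearly separable."""
    separable = 0
    for _ in range(l_a):                  # step 1: a new set of affinities
        affinities = draw_affinities(rng)
        for _ in range(l_c):              # steps 2-3: new concentrations, then learn
            patterns = draw_patterns(affinities, rng)
            if is_separable(patterns):
                separable += 1
    return separable / (l_a * l_c)


# Toy stand-ins (not the model): each "experiment" succeeds with probability 1/2.
rng = random.Random(0)
p_toy = estimate_ls_probability(
    draw_affinities=lambda r: r.random(),
    draw_patterns=lambda k, r: k,
    is_separable=lambda x: x < 0.5,
    l_a=100, l_c=1, rng=rng,
)
```

Keeping the three steps as separate callables makes it straightforward to vary M, p or the concentration range while reusing the same averaging loop.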

At this point we have $L_c \cdot L_a$ experiments, out of which a fraction $P(H_{min}, H_{max}, p, N, M)$ of cases were linearly separable. Keeping $H_{min}$, $H_{max}$, p, N fixed, we increase M and repeat the entire process, obtaining the probability functions $P(H_{min}, H_{max}, p, N, M)$ that are plotted in Fig. 3 versus $M/N^2$. Clearly the curves saturate when $M > M_c \sim N^2$. From this point on we have fixed the value of M at $M \sim N^2$. This numerical result can be estimated by using the analysis of Gardner and Derrida [?] for the capacity of biased patterns, using for the "magnetization" m the value $m \propto 1/N$. This gives, in addition to the leading behavior $M_c \sim N^2$, logarithmic corrections as well. We cannot rule out the possibility of such logarithmic corrections to the scaling we found here.

B. Measuring the probabilities $P(H_{max}, p, N)$

In all our experiments we fixed the value of $H_{min} = 0.5$ and hence the dependence of the probability on this variable has been suppressed. For various values of N, p and $H_{max}$ we calculate $P(H_{max}, p, N)$ in the manner described above. Keeping N and $H_{max}$ fixed, we increase p. For $p \ll N$ we have $P(H_{max}, p, N) \approx 1$ and the probability of LS decreases as p increases. We stop increasing p when $P(H_{max}, p, N)$ becomes smaller than some $\epsilon$.

The variation of $P(H_{max}, p, N)$ vs p/N is presented, for three values of $H_{max}$ and four values of N, in Fig. 4. The results presented in these figures are discussed in the next subsection.

We should mention here that for large N we used a heuristic modification of the ND halting criterion to label a problem as non-LS. Typical evolutions of the despair parameter are shown in Fig. 2. Each curve represents the history for a single learning set. Notice the huge difference in scales for the learnable and the unlearnable cases. The wide separation in final values of d suggests that a more practical, i.e. smaller, upper bound be used. For N = 30 (the largest value treated here) we used a different halting criterion in order to escape from the need to reach an exponentially high upper bound. After a small number of successful trial runs (that did produce linear separability) we identified the highest value of the despair, $d_m$, that was reached for a learnable set. This value was used to define our new heuristic halting criterion, $d_{bound} = N d_m$.
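To see why a smaller halting value is attractive, one can compare the two bounds numerically. The sketch below assumes that the rigorous bound reads $d_c = N^{(N+1)/2}/2^{N-1}$ and that the heuristic one is $N d_m$; the function names and the despair values passed to `heuristic_bound` are made up for illustration.

```python
import math


def nd_bound(n):
    """Assumed rigorous bound d_c = N^((N+1)/2) / 2^(N-1), computed via logarithms."""
    log_dc = 0.5 * (n + 1) * math.log(n) - (n - 1) * math.log(2.0)
    return math.exp(log_dc)


def heuristic_bound(n, learnable_despairs):
    """d_bound = N * d_m, with d_m the largest despair reached in learnable trial runs."""
    return n * max(learnable_despairs)


rigorous_10 = nd_bound(10)  # a few hundred
rigorous_30 = nd_bound(30)  # astronomically larger
# Hypothetical final despairs from a handful of learnable trial runs at N = 30.
practical_30 = heuristic_bound(30, [3.0, 5.5, 4.2])
```

Under these assumptions the rigorous bound grows faster than exponentially with N, while the heuristic bound stays within a modest multiple of the despair values actually observed in learnable runs.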

C. Finite Size Scaling Analysis 

As expected, for small $\alpha = p/N$ the probability for linear separability is close to 1, and it decreases as $\alpha$ increases. The curves obtained for fixed $H_{max}$ become sharper as N increases.

FIG. 4. (a) Probability that a learning set is LS as a function of $\alpha = p/N$, for maximal concentration $H_{max} = 50$ and $H_{min} = 0.5$; (b) $H_{max} = 70$ and $H_{min} = 0.5$; (c) $H_{max} = 100$ and $H_{min} = 0.5$.



Note that curves obtained for different N values cross at approximately the same value of $\alpha$. Similar behavior of the corresponding probability functions has been observed for random uncorrelated patterns [10]. Notice, however, that the crossing point is at some probability P < 1/2. Similar curves, obtained for other architectures, such as the parity and committee machines, crossed at P > 1/2. If there is a sharp transition in the thermodynamic limit ($N \to \infty$), these curves should approach a step function, with

$$P(H_{min}, H_{max}, \alpha) = \begin{cases} 1 & \alpha < \alpha_c(H_{min}, H_{max}) \\ 0 & \alpha > \alpha_c(H_{min}, H_{max}) \end{cases} \qquad (16)$$

That is, for $\alpha$ below a certain $\alpha_c(H_{min}, H_{max})$, a learning set will be LS with probability one and conversely, it will be LS with probability zero for $\alpha > \alpha_c(H_{min}, H_{max})$. The manner in which such a step function is approached as $N \to \infty$ can be described by a finite size scaling analysis (see e.g. [?]).

For each value of $H_{max}$ (keeping $H_{min}$ fixed) we tried a simple rescaling of the $\alpha$ variable, with two adjustable parameters, $\alpha_c$ and $\nu$:

$$y = (\alpha - \alpha_c)\, N^{1/\nu}$$

For the proper choice of $\alpha_c$ and $\nu$ we expect data collapse; that is, curves obtained for different values of N are expected to fall onto a single function, provided $P(H_{max}, \alpha, N)$ is plotted versus the scaled variable y. As can be seen in Figs. 5(a), (b) and (c), this expectation is borne out; the evidently good data collapse indeed substantiates the idea of a sharp transition at $\alpha_c$. As N increases, the function $P(H_{max}, \alpha, N)$ becomes increasingly sharper; its width near $\alpha_c$ decreases at a rate governed by the exponent $\nu$.
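A data collapse of this kind can be checked numerically with a simple figure of merit. The sketch below assumes the scaling form $y = (\alpha - \alpha_c) N^{1/\nu}$ and scores a trial pair $(\alpha_c, \nu)$ by pooling all points, sorting them by y and summing the squared jumps of P between neighbours. This neighbour-difference cost is a stand-in chosen for brevity, not the least squares fit reported in the paper, and the synthetic data are generated from an arbitrary smooth scaling function.

```python
import math


def scaled_variable(alpha, n, alpha_c, nu):
    """Assumed finite size scaling variable y = (alpha - alpha_c) * N^(1/nu)."""
    return (alpha - alpha_c) * n ** (1.0 / nu)


def collapse_cost(data, alpha_c, nu):
    """data: list of (N, alpha, P). A smaller cost indicates a better collapse."""
    points = sorted((scaled_variable(a, n, alpha_c, nu), p) for n, a, p in data)
    return sum((p2 - p1) ** 2 for (_, p1), (_, p2) in zip(points, points[1:]))


def toy_scaling_function(y):
    """Arbitrary smooth step, used only to generate synthetic test data."""
    return 1.0 / (1.0 + math.exp(y))


# Synthetic curves that collapse exactly for alpha_c = 2.5 and nu = 2.
data = [
    (n, a / 10.0, toy_scaling_function(scaled_variable(a / 10.0, n, 2.5, 2.0)))
    for n in (10, 20, 40)
    for a in range(10, 41)
]
cost_true = collapse_cost(data, 2.5, 2.0)
cost_shifted = collapse_cost(data, 2.0, 2.0)   # wrong alpha_c
cost_wrong_nu = collapse_cost(data, 2.5, 4.0)  # wrong exponent
```

In practice one would minimize such a cost over $\alpha_c$ and $\nu$, for instance on a grid, and inspect the resulting collapse by eye as in Fig. 5.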
FIG. 5. (a) Probability that a learning set is LS as a function of $y = (\alpha - \alpha_c) N^{1/\nu}$, for maximum concentration $H_{max} = 50$ and $H_{min} = 0.5$; (b) $H_{max} = 70$ and $H_{min} = 0.5$; (c) $H_{max} = 100$ and $H_{min} = 0.5$. The results of a least squares fit are (a) $H_{max} = 50$: $\nu = 4.8$, $\alpha_c = 2.8$; (b) $H_{max} = 70$: $\nu = 2.85$, $\alpha_c = 2.25$; (c) $H_{max} = 100$: $\nu = 2.75$, $\alpha_c = 2.1$.

Finally, we present in Fig. 6 the behavior of $\alpha_c$ as a function of $H_{max}$ (for fixed $H_{min}$). As $H_{max}$ increases, separation of the curves becomes an increasingly difficult task and hence $\alpha_c(H_{max})$ decreases. We find that it saturates at a low value close to $\alpha_{min} = 2$, which is exactly the Cover result. This interesting point is explained in the Appendix.

Note that even though we deal here with linear separability of curves, which one would expect to be a more difficult task than separating points, we found that our $\alpha_c$ exceeds the value derived for points, $\alpha_c = 2$. The reason is that the latter is the critical capacity for separating random, independent points; the curves we are trying to separate are not independent of each other. In fact, by construction we have $S_i^\mu > 0$ for all the background odorants; hence all these curves lie on one side of an entire family of planes. The target odorant, which also satisfies $S_i > 0$, should lie on the other side of the separating plane.

The curve $\alpha_c(H_{max})$ is, in effect, a phase boundary; on one side we have a "phase" in which the problem is LS, while in the other (high $\alpha$) region it is not. We now present a brief description of the manner in which linear separability breaks down as we cross this phase boundary by increasing $H_{max}$ at fixed $\alpha$.
FIG. 6. Phase diagram in the ($\alpha$, $H_{max}$) plane for $H_{min} = 0.5$. The curve separates the ($\alpha$, $H_{max}$) plane into the regions where the network can distinguish the target odorant (below), and where it cannot (above). Below (above) the curve a learning set is (is not) LS with probability one in the thermodynamic limit. The horizontal dotted line is $\alpha = 2$.



D. Breakdown of LS near phase boundary 

The manner in which LS breaks down as $H_{max}$ increases beyond the phase boundary is nicely illustrated by Figs. 7 and 8.
FIG. 7. The curves representing the p background odorants and the target odorant (marked by an arrow), in a LS case, $\alpha < \alpha_c$, for $H_{max} = 50$, $H_{min} = 0.5$, N = 20. In figure (a) the curves are projected onto a randomly selected plane; in (b) onto a plane Q (see text) that contains the weight vector $\mathbf{w}^*$ determined by the learning algorithm. The horizontal dotted line in figure (b) demonstrates linear separability of the target from background.



Consider the p curves, in an N-dimensional space, which represent the odorants, in a linearly separable case. We present in Fig. 7(a) a projection of these curves onto a randomly chosen plane. One of these (indicated by an arrow) is the target odorant; it seems to be entangled with the other curves. The point at which all curves seem to converge corresponds to the maximal concentration.

The purpose of the learning dynamics is to find a particular direction $\mathbf{w}$, along which one is able to separate the target curve from the others. Denote by V the hyperplane that passes through the origin and is perpendicular to $\mathbf{w}$; this is the linear manifold that separates the target from all the background curves. Select now any plane Q that contains $\mathbf{w}$, and project all curves onto Q; this produces Fig. 7(b). The horizontal dotted line shown here is the intersection of the hyperplane V with the plane Q. The projected background odorant curves lie on one side of this line and the target on the other. The situation depicted here is LS.

Consider now what happens when we turn the problem into a non-LS one by increasing $H_{max}$ beyond the phase boundary. As we increase the maximal concentration, the target odorant's curve penetrates to the "wrong" side of the hyperplane V. A picture of this situation is shown in Fig. 8(a).
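The projection onto a plane Q containing the weight vector can be sketched as follows. The helper names are illustrative, and the second direction of the plane is obtained here by Gram-Schmidt orthogonalization of an arbitrary vector `v` against `w`, which is one possible choice of Q among many.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_basis(w, v):
    """Orthonormal basis (u, w_hat) of a plane Q that contains w.

    v is any vector not parallel to w; it only selects which plane Q is used.
    """
    w_norm = math.sqrt(dot(w, w))
    w_hat = [x / w_norm for x in w]
    overlap = dot(v, w_hat)
    u = [x - overlap * y for x, y in zip(v, w_hat)]  # remove the component along w
    u_norm = math.sqrt(dot(u, u))
    return [x / u_norm for x in u], w_hat


def project(point, basis):
    """Coordinates of a point in Q: (horizontal along u, vertical along w_hat)."""
    u, w_hat = basis
    return dot(point, u), dot(point, w_hat)


w = [0.0, 0.0, 2.0]   # illustrative weight vector
v = [1.0, 1.0, 1.0]   # arbitrary vector used to pick the plane
basis = plane_basis(w, v)
x_coord, y_coord = project([3.0, 4.0, 5.0], basis)
```

In these coordinates the hyperplane V appears as the horizontal axis, so the sign of the vertical coordinate of each projected point equals the sign of its scalar product with $\mathbf{w}$.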




FIG. 8. The same situation as in Fig. 7, but with $H_{max}$ increased beyond the limit of learnability. (a) Projecting the curves onto the same plane as for the LS case, we see that the target penetrates to the "wrong" side of the broken line. (b) Further attempts to learn a new $\mathbf{w}$ will fail.

This is a non-LS problem, which means that no matter how long we run our learning algorithm, we will never find a hyperplane V that separates the target from all the background. If we nevertheless keep running our learning algorithm, the direction of our candidate for $\mathbf{w}$ will keep changing as we "learn", but since the critical capacity curve of Fig. 6 has been crossed, no amount of further learning will produce a separating plane. The density of points near the high concentration limit is much larger than for low concentrations. Hence further learning will perhaps be able to separate the target from the background at high concentrations, but then separability breaks down at low concentrations (see Fig. 8(b)).



IV. SUMMARY AND DISCUSSION 



In the olfactory bulb of most vertebrates, each secondary neuron (mitral or tufted cell) receives input from only 
one glomerulus, which in turn is innervated, in all likelihood, by axons stemming from olfactory epithelial sensory 
cells that all express the same olfactory receptor protein. Thus, the grandmother cell modeled here may not simply 
represent a mitral or tufted cell. However, when the network of periglomerular and granule cells (interneurons) is 
taken into account, then it is fair to state that each mitral cell receives (indirect) input from a large number of different 
olfactory receptor types. Thus, the present analysis may be relevant to the kind of neuronal processing that takes 
place in the first neuronal relay station of the olfactory pathway, the olfactory bulb. Alternatively, it may represent, 
in abstract fashion, information processing that takes place both in the olfactory bulb and at higher olfactory central 
nervous system centers. 

Previously, several studies have been published that analyze neuronal networks for the olfactory system. 
However, none of these was based on a quantitative model for the affinity relationships within the entire 
olfactory receptor repertoire. Here, we use the Receptor Affinity Distribution (RAD) model, which was developed, 
based on general biochemical considerations, for receptor repertoires, including that of olfactory receptors. The power 
of this approach lies in utilizing global knowledge about the repertoire to analyze the fidelity of discrimination among 
odorants. It has been pointed out in the past that the RAD model may be used to analyze the signal to noise ratio 
in systems in which specific binding to a receptor has to be distinguished from the background of numerous other 
receptors which constitute "non-specific binding". Here, we apply a similar concept to an analysis of signal to 
noise discrimination in the case of a neuronal network whose input stems from a receptor repertoire. 

The results presented here suggest that for a fixed number of background odorants there is a maximal odorant con- 
centration beyond which odorant discrimination becomes impossible. This is not surprising, since olfactory receptors 
are saturable, and at very high concentrations weak affinity receptors as well as high affinity ones will generate com- 
parable signals. However, it is noteworthy that despite the fact that information capacity for odorant discrimination 
rapidly declines as odorant concentration goes up, the presently analyzed network is still capable of discrimination 
even at concentrations for which $H\langle K\rangle$ is of the order of a few hundred (where $\langle K\rangle$ is the average affinity). 

The model network consists of N sensory neurons, each of which is characterized by a set of affinities to a number 
of odorants. When any particular odorant, $\mu$, is present, sensory neuron i produces a (nonlinear) response, $S_i^\mu$. These 
responses constitute the inputs to a single processing unit (secondary neuron), which performs a weighted summation 
of all the N inputs. The secondary neuron's output is the sign of this weighted sum. The aim of this single processing 
unit is to identify one single odorant and separate it from all the others that may be sensed by the system. This secondary 
neuron plays the role of a "grandmother cell" for a particular target odorant. An assembly of $P_0$ such secondary 
neurons may constitute, together with the sensory neurons, a system that is able to clearly identify the presence of 
$P_0$ target odorants against a background of P odorants. 
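
A minimal sketch of this architecture is given below. It is an illustration under stated assumptions, not the paper's implementation: the saturating response f(HK) = HK/(1 + HK) stands in for equation (6), which is not reproduced in this section, and the affinities are drawn from a uniform distribution rather than from the RAD model.

```python
import math
import random

def response(h, k):
    """Assumed saturating response f(HK) = HK / (1 + HK).
    This is a stand-in for equation (6), not the paper's definition."""
    x = h * k
    return x / (1.0 + x)

def sensory_pattern(h, affinities):
    """Responses S_i of the N sensory neurons to one odorant at
    concentration h, normalized so the pattern lies on the unit hypersphere."""
    s = [response(h, k) for k in affinities]
    norm = math.sqrt(sum(v * v for v in s))
    return [v / norm for v in s]

def grandmother_cell(weights, pattern):
    """Secondary neuron: the sign of the weighted sum of its N inputs."""
    total = sum(w * s for w, s in zip(weights, pattern))
    return 1 if total > 0 else -1

rng = random.Random(0)
N = 5
# Illustrative affinities only; the paper draws them from the RAD model.
affinities = [rng.uniform(0.1, 2.0) for _ in range(N)]
weights = [1.0] * N
pattern = sensory_pattern(3.0, affinities)
output = grandmother_cell(weights, pattern)
print(output)
```

With all weights positive the unit fires for every odorant, since all responses are positive; discrimination requires learned weights of both signs.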

We posed a well-defined quantitative question: given that each odorant may appear with a concentration that 
lies in a certain range, $H_{\min} < H < H_{\max}$, what is the maximal number of background odorants, $P_c = \alpha_c N$, from 
which a single target can be separated with probability 1? The answer is summarized in Fig. 5, where the critical 
capacity $\alpha_c$ is plotted vs. $H_{\max}$. The result is obtained in the limit of large N (i.e. many sensory neurons; in fact, for 
N = 100 this result should already give excellent precision). For a dynamic range $H_{\max}/H_{\min}$ of about 100 we 
find $\alpha_c \approx 2.5$. That is, for, say, N = 300 sensory neurons we can distinguish the target from about 750 background 
odorants. Hence if we assemble 750 odorants and appoint a grandmother cell for each, we will be able to identify 
them one by one. 

In order to get this quantitative answer we had to generalize an old problem, that of the linear separability of P points on 
an (N - 1)-dimensional hypersphere, to the new problem of linearly separating P curves that lie on the same hypersphere. 
We have shown that in order to represent a curve by discrete points that lie on it, we have to place on each curve a number 
of points M that grows as a power of N. The results were obtained by a perceptron learning algorithm that signals when a 
problem is unlearnable, i.e. not linearly separable. 
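
The discretization and learning steps can be sketched as follows. Several choices here are assumptions made for illustration: the response function f(HK) = HK/(1 + HK), the uniform spacing of the M concentrations, and the uniformly drawn affinities. The learner is the plain Rosenblatt perceptron with a cap on the number of sweeps; unlike the algorithm used in the paper, it cannot certify that a problem is non-LS, and a return value of None only means that no separating w was found within the cap.

```python
import math
import random

def response(h, k):
    """Assumed saturating response, a stand-in for equation (6)."""
    x = h * k
    return x / (1.0 + x)

def curve_points(affinities, h_min, h_max, m):
    """Represent one odorant's curve by m points, taken at concentrations
    spaced uniformly in [h_min, h_max] (an assumed spacing), each point
    normalized onto the unit hypersphere."""
    points = []
    for j in range(m):
        t = j / (m - 1) if m > 1 else 0.0
        h = h_min + t * (h_max - h_min)
        s = [response(h, k) for k in affinities]
        norm = math.sqrt(sum(v * v for v in s))
        points.append([v / norm for v in s])
    return points

def perceptron(points, labels, max_sweeps=1000):
    """Plain perceptron learning. Returns a weight vector w with
    label * (w . x) > 0 for every point, or None if the sweep cap is hit."""
    w = [0.0] * len(points[0])
    for _ in range(max_sweeps):
        errors = 0
        for x, y in zip(points, labels):
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:
            return w
    return None

# Small illustrative problem: one target curve (+1), two background curves (-1).
rng = random.Random(1)
N, M = 10, 4
points, labels = [], []
for label in (+1, -1, -1):
    affinities = [rng.uniform(0.2, 2.0) for _ in range(N)]
    for p in curve_points(affinities, 0.5, 5.0, M):
        points.append(p)
        labels.append(label)
w = perceptron(points, labels, max_sweeps=200)
print("separating w found" if w is not None else "no separating w within the cap")
```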

Results obtained at various values of N were shown to collapse when plotted as functions of properly defined scaled 
variables, which allowed easy extrapolation to large values of N. 



APPENDIX 



The behavior of the phase boundary for large concentrations, $\alpha_c \to 2$ (Fig. 5), is quite surprising since the network 
may be expected to enter a totally confused state due to the saturation of the nonlinear sensory neurons. This could 
be expected to lead instead to $\alpha_c \to 0$. That the Cover result ($\alpha_c = 2$) is recovered in the high concentration regime 
can in fact be understood by the following argument. 

We first calculate the probability $P(S)$ that a sensory unit gives a response S to the presentation of an odorant in 
the range (1), by 

$P(S) = \langle \delta(S - f(HK)) \rangle$ (17) 

where the average is taken over the concentrations H, uniformly distributed in the range (1), and over the affinities K, 
distributed according to the RAD model $\psi(K)$ of equation (2). $f$ is given by equation (6). The integrals lead to 



{P^max Hmiri) \ \^Hniax (1 J \(7Hmin (1 

where $\mathrm{Erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} \exp(-u^2)\, du$ is the complementary error function. This probability has one peak which 
sharpens and moves to higher values of S as $H_{\max}$ grows. However, at the very ends of the interval, S = 1 or 0, the 
probability is zero. That $P(1) = 0$ for every $H_{\max}$ is the source of the surprise. The peak, which concentrates all 
the probability, gets arbitrarily close to S = 1 as the concentration increases, but never reaches the extreme of 
the interval. In fact $S_{peak} \sim 1 - c/H_{\max}$. Therefore the components of the vectors $S^\mu$ will be, with overwhelming 
probability, at the peak position, which can be written as $S_i^\mu = 1 - \epsilon_i^\mu$ with all $\epsilon_i^\mu(H_{\max}, K_i^\mu)$ small but strictly 
positive. Neglecting second order terms in $\epsilon$, the normalized patterns will then be: 

$(1 - \epsilon_i^\mu + \bar{\epsilon}^\mu)/\sqrt{N}$ 

where $\bar{\epsilon}^\mu$ is the average of $\epsilon_i^\mu$ over the N sensory neurons. 
Therefore the $S^\mu$ vectors are unbiasedly distributed around $(1, 1, \ldots, 1)/\sqrt{N}$. We are taken back to the original 
Cover-Gardner problem of separating p unbiased patterns with a hyperplane, and the result $\alpha_c = 2$ is no longer a 
surprise. This argument doesn't deal with the asymptotic behavior of the capacity in the presence of any kind of 
noise. In that case the naive expectation that $\alpha_c \to 0$ for $H_{\max} \to \infty$ is probably borne out. 
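
The first-order expansion of the normalized patterns can be checked numerically. The sketch below assumes the saturating response f(HK) = HK/(1 + HK) as a stand-in for equation (6) and uses four arbitrary affinities; it compares the exactly normalized pattern with the first-order expression, in which the deviations from 1 enter through their difference from the mean deviation.

```python
import math

def response(h, k):
    """Assumed saturating response, a stand-in for equation (6)."""
    x = h * k
    return x / (1.0 + x)

def normalized(s):
    """Exact normalization of a response vector to unit length."""
    norm = math.sqrt(sum(v * v for v in s))
    return [v / norm for v in s]

def first_order(s):
    """First-order approximation (1 - eps_i + eps_bar) / sqrt(N),
    where eps_i = 1 - S_i and eps_bar is the mean of the eps_i."""
    n = len(s)
    eps = [1.0 - v for v in s]
    eps_bar = sum(eps) / n
    return [(1.0 - e + eps_bar) / math.sqrt(n) for e in eps]

affinities = [0.5, 1.0, 1.5, 2.0]   # arbitrary illustrative values
h = 1000.0                          # high concentration, deep in saturation
s = [response(h, k) for k in affinities]
exact = normalized(s)
approx = first_order(s)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(max_err)
```

The discrepancy is of second order in the deviations, and the components scatter on both sides of 1/sqrt(N), which is the sense in which the patterns are unbiased around (1, 1, ..., 1)/sqrt(N).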